
remember that the consequence may also be beneficial in some cases. For instance, it has

been reported that truncating variants in CARD9, IL23R, and RNF186 proteins may protect

against Crohn’s disease and ulcerative colitis, and that truncating variants in ANGPTL4,

APOC3, PCSK9, and LPA proteins may protect against coronary heart disease [9, 10]. The

impact of a variant on a protein also depends on the number of isoforms produced by

the affected gene and the percentage of the protein affected. Moreover, we should take into

consideration that a frameshift may be bypassed by splicing, or that its impact may be offset

by a second frameshift that restores the reading frame. When annotating genetic variants, the

effect of SNVs can be predicted with the highest accuracy, followed by small InDels (1–50 bp)

and then medium InDels (50–100 bp). The effect of a missense SNV is also relatively easy to

predict. The variant annotation

tools have several approaches to predicting the effect of a missense variant. For instance, they

can take into consideration the physicochemical properties of the amino acids, whether the

variant is in a conserved region, and whether it affects the three-dimensional structure of

the protein. A variant in a region conserved across species or in a region of secondary

or tertiary structure is more likely to be deleterious. Some tools use homology modeling

to simulate the structure of the altered protein and predict the effect of variants, while other

tools use machine learning with multiple features to annotate the variants with the appropriate

information and consequences. Figure 4.8 and Table 4.3 show a segment of a eukaryotic

genomic gene and possible variant annotations in each region.
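
The coding consequences discussed above can be illustrated with a minimal Python sketch. It is not how any particular annotation tool works; the function names (`snv_consequence`, `indel_consequence`) and the four-class physicochemical grouping of amino acids are assumptions made for illustration only. The sketch translates the reference and altered codons with the standard genetic code, labels the change, and applies the rule that an InDel whose net length change is not a multiple of three shifts the reading frame.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Coarse physicochemical classes. This grouping is an illustrative
# assumption; real tools use richer substitution scores.
AA_CLASS = {}
for label, members in {
    "nonpolar": "GAVLIMPFW",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}.items():
    for aa in members:
        AA_CLASS[aa] = label


def snv_consequence(ref_codon: str, pos_in_codon: int, alt_base: str) -> str:
    """Label a single-base change inside one codon (0-based position)."""
    alt_codon = ref_codon[:pos_in_codon] + alt_base + ref_codon[pos_in_codon + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    same_class = AA_CLASS[ref_aa] == AA_CLASS[alt_aa]
    kind = "conservative" if same_class else "radical"
    return f"missense ({ref_aa}>{alt_aa}, {kind})"


def indel_consequence(ref: str, alt: str) -> str:
    """Label an InDel by whether it preserves the reading frame."""
    delta = len(alt) - len(ref)
    if delta == 0:
        raise ValueError("not an insertion or deletion")
    if delta % 3 != 0:
        return "frameshift"
    return "inframe_insertion" if delta > 0 else "inframe_deletion"


if __name__ == "__main__":
    print(snv_consequence("GAG", 1, "T"))   # glutamate codon changed to GTG
    print(snv_consequence("TGG", 2, "A"))   # tryptophan codon changed to TGA
    print(indel_consequence("ACGT", "A"))   # 3-bp deletion
```

A conserved-region or structural check, as described in the text, would need an alignment or a protein structure and is deliberately left out of this sketch.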

In a typical genome-wide variant study, thousands of variants may be discovered. The

significance of these variants varies based on the type, location, and possible consequence.
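
As a rough illustration of how thousands of variants might be triaged by consequence, the following sketch sorts variant records by an assumed severity rank. The `SEVERITY` scores, the consequence labels, and the record layout are hypothetical and chosen only for this example; they are not an official scale from any annotation tool.

```python
# Illustrative severity ranks (higher = examined first). These numbers are
# assumptions for this sketch, not a published scoring scheme.
SEVERITY = {
    "stop_gained": 4,
    "frameshift": 4,
    "splice_site": 4,
    "missense": 3,
    "inframe_indel": 3,
    "5_prime_UTR": 2,
    "3_prime_UTR": 2,
    "synonymous": 1,
    "intronic": 1,
    "upstream": 0,
    "downstream": 0,
}


def prioritize(variants):
    """Return variants sorted from most to least severe consequence.

    Unknown consequence labels get rank -1 and therefore sort last.
    """
    return sorted(
        variants,
        key=lambda v: SEVERITY.get(v["consequence"], -1),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical records; positions are made up for the example.
    calls = [
        {"id": "var1", "pos": 1200, "consequence": "intronic"},
        {"id": "var2", "pos": 1350, "consequence": "stop_gained"},
        {"id": "var3", "pos": 1410, "consequence": "missense"},
    ]
    for v in prioritize(calls):
        print(v["id"], v["consequence"])
```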

FIGURE 4.8  Variant effect on gene regions.

TABLE 4.3  Gene Regions and Variant Effect

| Region | Variant Effect |
|---|---|
| (1) Regulatory region including transcription factor (TF) binding site | Deleterious variants |
| (2) Upstream gene region | Intergenic variants/upstream gene variant |
| (3) 5′ UTR region | 5′ UTR variant |
| (4) Transcription start site (TSS) | Start retained or start lost variants |
| (5) Exon region | Exonic variants include missense, nonsense (stop gained), frameshift, inframe insertion or deletion |
| (6) Splice donor region | Splice-site variant (exon loss, intron inclusion, altered protein-coding sequence) |
| (7) Splice acceptor region | Splice-site variant (exon loss, intron inclusion, altered protein-coding sequence) |
| (8) Intron region | Intronic variant |
| (9) Transcription termination site (TTS) | Stop lost, stop retained, incomplete terminal codon |
| (10) 3′ UTR region | 3′ UTR variant |
| (11) Downstream gene region | Downstream gene variant |
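
The mapping from position to region in Table 4.3 can be sketched as a simple lookup. The example below is a simplified illustration, not the method of any specific tool: the `GeneModel` class, the 1,000-bp flanking window, the 2-bp splice-site window, and all coordinates are assumptions, the gene is taken to lie on the forward strand, and the exons are assumed to be sorted and to span the whole transcript. The returned labels are written in a Sequence Ontology-like style.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    """Hypothetical forward-strand gene; coordinates are 1-based, inclusive."""
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: list          # sorted (start, end) tuples covering tx_start..tx_end
    flank: int = 1000    # assumed size of upstream/downstream windows
    splice: int = 2      # assumed intronic bases counted as splice sites


def annotate_region(gene: GeneModel, pos: int) -> str:
    """Return a region label for a genomic position relative to one gene."""
    if pos < gene.tx_start:
        near = gene.tx_start - pos <= gene.flank
        return "upstream_gene_variant" if near else "intergenic_variant"
    if pos > gene.tx_end:
        near = pos - gene.tx_end <= gene.flank
        return "downstream_gene_variant" if near else "intergenic_variant"

    # Exonic positions: untranslated or coding, depending on the CDS bounds.
    for start, end in gene.exons:
        if start <= pos <= end:
            if pos < gene.cds_start:
                return "5_prime_UTR_variant"
            if pos > gene.cds_end:
                return "3_prime_UTR_variant"
            return "coding_sequence_variant"

    # Intronic positions: check distance to the flanking exons.
    for (_, left_end), (right_start, _) in zip(gene.exons, gene.exons[1:]):
        if left_end < pos < right_start:
            if pos - left_end <= gene.splice:
                return "splice_donor_variant"
            if right_start - pos <= gene.splice:
                return "splice_acceptor_variant"
            return "intron_variant"

    raise ValueError("position is inside the transcript but matches no region")


if __name__ == "__main__":
    gene = GeneModel(
        tx_start=5000, tx_end=6000, cds_start=5100, cds_end=5950,
        exons=[(5000, 5200), (5500, 5700), (5900, 6000)],
    )
    for p in (4500, 5050, 5150, 5201, 5300, 5499, 5980, 6500):
        print(p, annotate_region(gene, p))
```

A production annotator would also handle the reverse strand, multiple transcripts per gene, and regulatory regions, none of which this sketch attempts.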